-
Notifications
You must be signed in to change notification settings - Fork 15.3k
[LoopInterchange] Update the direction of undistributed loop to EQ #78951
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
@llvm/pr-subscribers-llvm-analysis @llvm/pr-subscribers-llvm-transforms Author: None (ShivaChen) ChangesIt could be an alternative way of #77719 to enable interchange for the following case. isDirectionNegative() is the function to detect Full diff: https://github.com/llvm/llvm-project/pull/78951.diff 2 Files Affected:
diff --git a/llvm/lib/Analysis/DependenceAnalysis.cpp b/llvm/lib/Analysis/DependenceAnalysis.cpp
index 1bce9aae09bb26..7d25049e03692f 100644
--- a/llvm/lib/Analysis/DependenceAnalysis.cpp
+++ b/llvm/lib/Analysis/DependenceAnalysis.cpp
@@ -290,7 +290,7 @@ FullDependence::FullDependence(Instruction *Source, Instruction *Destination,
bool FullDependence::isDirectionNegative() const {
for (unsigned Level = 1; Level <= Levels; ++Level) {
unsigned char Direction = DV[Level - 1].Direction;
- if (Direction == Dependence::DVEntry::EQ)
+ if (Direction == Dependence::DVEntry::EQ || isScalar(Level))
continue;
if (Direction == Dependence::DVEntry::GT ||
Direction == Dependence::DVEntry::GE)
diff --git a/llvm/test/Transforms/LoopInterchange/pr71519.ll b/llvm/test/Transforms/LoopInterchange/pr71519.ll
new file mode 100644
index 00000000000000..c364d89873a5dc
--- /dev/null
+++ b/llvm/test/Transforms/LoopInterchange/pr71519.ll
@@ -0,0 +1,64 @@
+; REQUIRES: asserts
+; RUN: opt < %s -passes=loop-interchange -cache-line-size=64 -verify-dom-info -verify-loop-info \
+; RUN: -S -debug 2>&1 | FileCheck %s
+
+@aa = global [256 x [256 x float]] zeroinitializer, align 64
+@bb = global [256 x [256 x float]] zeroinitializer, align 64
+
+;; for (int nl = 0; nl < 10000000/256; nl++)
+;; for (int i = 0; i < 256; ++i)
+;; for (int j = 1; j < 256; j++)
+;; aa[j][i] = aa[j - 1][i] + bb[j][i];
+
+; CHECK: Before normalizing negative direction vectors:
+; CHECK: consistent anti [S 0 -1]!
+; CHECK: After normalizing negative direction vectors:
+; CHECK: consistent flow [S 0 1]!
+; CHECK: Negative dependence vector normalized.
+; CHECK: Found flow dependency between Src and Dst
+; CHECK: Src: %1 = load float, ptr %arrayidx10, align 4
+; CHECK: Dst: store float %add, ptr %arrayidx18, align 4
+; CHECK: Processing InnerLoopId = 2 and OuterLoopId = 1
+; CHECK: Loops interchanged.
+
+define float @s231() {
+entry:
+ br label %for.cond1.preheader
+
+; Loop:
+for.cond1.preheader: ; preds = %entry, %for.cond.cleanup3
+ %nl.036 = phi i32 [ 0, %entry ], [ %inc23, %for.cond.cleanup3 ]
+ br label %for.cond5.preheader
+
+for.cond.cleanup3: ; preds = %for.cond.cleanup7
+ %inc23 = add nuw nsw i32 %nl.036, 1
+ %exitcond41 = icmp ne i32 %inc23, 39062
+ br i1 %exitcond41, label %for.cond1.preheader, label %for.cond.cleanup
+
+for.cond.cleanup7: ; preds = %for.body8
+ %indvars.iv.next39 = add nuw nsw i64 %indvars.iv38, 1
+ %exitcond40 = icmp ne i64 %indvars.iv.next39, 256
+ br i1 %exitcond40, label %for.cond5.preheader, label %for.cond.cleanup3
+
+for.body8: ; preds = %for.cond5.preheader, %for.body8
+ %indvars.iv = phi i64 [ 1, %for.cond5.preheader ], [ %indvars.iv.next, %for.body8 ]
+ %0 = add nsw i64 %indvars.iv, -1
+ %arrayidx10 = getelementptr inbounds [256 x [256 x float]], ptr @aa, i64 0, i64 %0, i64 %indvars.iv38
+ %1 = load float, ptr %arrayidx10, align 4
+ %arrayidx14 = getelementptr inbounds [256 x [256 x float]], ptr @bb, i64 0, i64 %indvars.iv, i64 %indvars.iv38
+ %2 = load float, ptr %arrayidx14, align 4
+ %add = fadd fast float %2, %1
+ %arrayidx18 = getelementptr inbounds [256 x [256 x float]], ptr @aa, i64 0, i64 %indvars.iv, i64 %indvars.iv38
+ store float %add, ptr %arrayidx18, align 4
+ %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
+ %exitcond = icmp ne i64 %indvars.iv.next, 256
+ br i1 %exitcond, label %for.body8, label %for.cond.cleanup7
+
+for.cond5.preheader: ; preds = %for.cond1.preheader, %for.cond.cleanup7
+ %indvars.iv38 = phi i64 [ 0, %for.cond1.preheader ], [ %indvars.iv.next39, %for.cond.cleanup7 ]
+ br label %for.body8
+
+; Exit blocks
+for.cond.cleanup: ; preds = %for.cond.cleanup3
+ ret float undef
+}
|
|
I would need a more convincing argument than "it makes my case work". Suppose the scalar dependency is an implicit This example would not be reversed because there is no inner level but consider this to be the "either way goes" case. Possibly, |
bmahjour
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Skipping the scalar direction is incorrect. Scalar dependencies get carried from every iteration to every iteration, so they must be treated more conservatively than a = dependence.
I think this problem should be solved in the loop interchange pass itself where if the candidate pair of loops is nested inside other loops, and those outer loop IVs don't contribute to the index calculations in the pair of loops, then the dependence entries corresponding to the outer loops should be ignored.
Thanks for the case. That make sense to me. |
Thanks for pointing the direction. :-) Could you help me to check does the latest commit as you expected? |
I'm saying let's not modify DependenceAnalysis. The custom handling should be done in Loop Interchange to ignore dependencies at outer levels when they don't matter to the legality of interchange. |
I try to implement without modifying DependenceAnalysis and figure out that LoopInterchange rely on normalize() to enable few optimization opportunities. normalize() didn't happen because the outer loop is the initial value DVEntry::ALL. Setting non-constributed(IV didn't affect instructions) loops direction to NONE for each dependency by scanning the SCEV form of the pointers of Src and Dst instructions instead of initial value ALL seems generally correct for DependenceAnalysis. Or LoopInterchange should explore the way to legalize interchange without normalize? Thanks. |
What would the semantics of IMHO, how loop interchange deals with the "non-contributing" outer loops is an implementation issue inside loop-interchange itself. One solution might be to have its own normalize function, so you construct the dependence matrix, prune it (eg. by replacing the |
33e5c6f to
05a8a81
Compare
|
✅ With the latest revision this PR passed the C/C++ code formatter. |
Have created normalize function in interchange. Thanks for the advice. :-) |
The motivation is to introduce the custom functions for LoopInterchange.
This commit enables loop-interchange for the case in llvm#71519. With the loop-interchange, the case can be vectorized. for (int nl = 0; nl < 10000000/256; nl++) // Level 1 for (int i = 0; i < 256; ++i) // Level 2 for (int j = 1; j < 256; j++) // Level 3 aa[j][i] = aa[j - 1][i] + bb[j][i]; The case can't be interchanged without normalizaion. normalizaion didn't occur because the direction of level 1 loop dependence between aa[j][i] and aa[j - 1][i] is default value '*'. By scanning SCEV form of the pointer of aa[j][i] and aa[j - 1][i], the pass and determine the IV of loop 1(nl) didn't affect the value of aa[j][i] and aa[j - 1][i]. And then updating the direction of loop 1 to '=' to enable the normalization.
05a8a81 to
1e6fd79
Compare
|
Fix clang-format and rebase. |
| } | ||
| } | ||
|
|
||
| static bool normalize(std::vector<Dependence::DVEntry> &DV, ScalarEvolution *SE, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am learning this transformation and came across your patch.
This function is sort of duplicate of FullDependece::normalize() but does not update the distance vector. Do you think we should avoid duplication? and updating distance vector is not necessary?
| static void dumpDirection(raw_ostream &OS, | ||
| std::vector<Dependence::DVEntry> &DV) { | ||
| OS << " ["; | ||
| for (unsigned II = 1; II <= DV.size(); ++II) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel that we can start from 0 and avoid subtracting 1 each time we access DV but I understand that this is practice in DA.
This commit enables loop-interchange for the case in #71519.
With the loop-interchange, the case can be vectorized.
The case can't be interchanged without normalizaion.
normalizaion didn't occur because the direction of level 1 loop dependence between aa[j][i] and aa[j - 1][i] is default value '*'.
By scanning SCEV form of the pointer of aa[j][i] and aa[j - 1][i], the pass can determine the IV of loop 1(nl) didn't affect the value of aa[j][i] and aa[j - 1][i]. And then updating the direction of loop 1 to '=' to enable the normalization.